
    Margin-based Ranking and an Equivalence between AdaBoost and RankBoost

    We study boosting algorithms for learning to rank. We give a general margin-based bound for ranking based on covering numbers for the hypothesis space. Our bound suggests that algorithms that maximize the ranking margin will generalize well. We then describe a new algorithm, smooth margin ranking, that precisely converges to a maximum ranking-margin solution. The algorithm is a modification of RankBoost, analogous to “approximate coordinate ascent boosting.” Finally, we prove that AdaBoost and RankBoost are equally good for the problems of bipartite ranking and classification in terms of their asymptotic behavior on the training set. Under natural conditions, AdaBoost achieves an area under the ROC curve that is as good as RankBoost’s; furthermore, RankBoost, when given a specific intercept, achieves a misclassification error that is as good as AdaBoost’s. This may help to explain the empirical observations made by Cortes and Mohri, and Caruana and Niculescu-Mizil, about the excellent performance of AdaBoost as a bipartite ranking algorithm, as measured by the area under the ROC curve.
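The two quality measures contrasted in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function names `auc` and `misclassification_error`, the toy scores, and the intercept value are all assumptions chosen for illustration. The point it shows is that the area under the ROC curve depends only on how positives are ordered relative to negatives, while misclassification error also depends on where the decision threshold (the intercept) is placed.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2.

    For bipartite ranking this equals the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


def misclassification_error(scores, labels, intercept):
    """Error rate of the classifier sign(score + intercept)."""
    wrong = sum(
        1
        for s, y in zip(scores, labels)
        if (1 if s + intercept > 0 else -1) != y
    )
    return wrong / len(labels)


# Hypothetical scores from some ranking function, with labels in {+1, -1}.
scores = [2.0, 1.5, 1.2, 0.5]
labels = [1, 1, -1, -1]

ranking_quality = auc(scores, labels)                           # perfect ordering
error_no_shift = misclassification_error(scores, labels, 0.0)   # threshold misplaced
error_shifted = misclassification_error(scores, labels, -1.3)   # threshold between classes
```

In this toy case the ranking is perfect, yet the classifier with intercept 0 labels every example positive; shifting the intercept fixes the classification without changing the ranking.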

    The Rate of Convergence of AdaBoost

    The AdaBoost algorithm was designed to combine many "weak" hypotheses that perform slightly better than random guessing into a "strong" hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the "exponential loss." Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Our first result shows that at iteration $t$, the exponential loss of AdaBoost's computed parameter vector will be at most $\epsilon$ more than that of any parameter vector of $\ell_1$-norm bounded by $B$ in a number of rounds that is at most a polynomial in $B$ and $1/\epsilon$. We also provide lower bounds showing that a polynomial dependence on these parameters is necessary. Our second result is that within $C/\epsilon$ iterations, AdaBoost achieves a value of the exponential loss that is at most $\epsilon$ more than the best possible value, where $C$ depends on the dataset. We show that this dependence of the rate on $\epsilon$ is optimal up to constant factors, i.e., at least $\Omega(1/\epsilon)$ rounds are necessary to achieve within $\epsilon$ of the optimal exponential loss. Comment: A preliminary version will appear in COLT 201
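The quantity studied in this abstract can be made concrete with a minimal sketch of AdaBoost viewed as coordinate descent on the exponential loss. This is the standard textbook formulation, not code from the paper, and everything named here is an assumption for illustration: the matrix `M` (entry `M[i][j]` is +1 if weak hypothesis `j` is correct on example `i`, else -1), the helper names, and the toy data.

```python
import math


def exp_loss(M, lam):
    """Average exponential loss (1/m) * sum_i exp(-(M lam)_i)."""
    return sum(
        math.exp(-sum(m_ij * l_j for m_ij, l_j in zip(row, lam)))
        for row in M
    ) / len(M)


def adaboost_exp_loss(M, rounds):
    """Run AdaBoost as coordinate descent; return (lam, loss after each round)."""
    m, n = len(M), len(M[0])
    lam = [0.0] * n
    losses = [exp_loss(M, lam)]
    for _ in range(rounds):
        # Example weights are proportional to exp(-margin) under the current lam.
        w = [math.exp(-sum(m_ij * l_j for m_ij, l_j in zip(row, lam))) for row in M]
        total = sum(w)
        d = [w_i / total for w_i in w]
        # Edge of each weak hypothesis under the current weights.
        edges = [sum(d[i] * M[i][j] for i in range(m)) for j in range(n)]
        j = max(range(n), key=lambda k: abs(edges[k]))
        r = edges[j]
        if abs(r) >= 1.0:
            break  # a perfect hypothesis would require an infinite step
        # Exact line search along coordinate j for the exponential loss.
        lam[j] += 0.5 * math.log((1.0 + r) / (1.0 - r))
        losses.append(exp_loss(M, lam))
    return lam, losses


# Toy data (hypothetical): 4 examples, 3 weak hypotheses, each wrong on one example.
M = [
    [+1, +1, -1],
    [+1, -1, +1],
    [-1, +1, +1],
    [+1, +1, +1],
]
lam, losses = adaboost_exp_loss(M, rounds=20)
```

Each round multiplies the loss by $\sqrt{1 - r^2}$, where $r$ is the chosen edge, so the recorded losses never increase. In this toy dataset the loss can be driven toward zero only by letting the parameters grow without bound, which is the kind of case where the minimizer is not finite.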

    Generalization bounds for averaged classifiers

    We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, that is, the hypothesis that minimizes the training error, our algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction of this algorithm is much more stable than the prediction of an algorithm that predicts with the best hypothesis. By allowing the algorithm to abstain from predicting on some examples, we show that the predictions it makes when it does not abstain are very reliable. Finally, we show that the probability that the algorithm abstains is comparable to the generalization error of the best hypothesis in the class. Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000005
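The idea described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's exact prediction rule: the normalized weighted vote, the temperature `eta`, the abstention threshold `delta`, the threshold-stump hypothesis class, and the toy data are all hypothetical choices.

```python
import math


def make_stump(threshold):
    """Hypothetical weak hypothesis: predict +1 if x > threshold, else -1."""
    return lambda x: 1 if x > threshold else -1


def training_error(h, xs, ys):
    """Fraction of training examples that h misclassifies."""
    return sum(1 for x, y in zip(xs, ys) if h(x) != y) / len(xs)


def averaged_predict(hyps, xs, ys, x, eta=5.0, delta=0.5):
    """Predict with an exponentially weighted vote; return 0 to abstain.

    Each hypothesis gets weight exp(-eta * training_error). The prediction is
    the sign of the normalized weighted vote, unless the vote is within delta
    of zero, in which case the function abstains.
    """
    weights = [math.exp(-eta * training_error(h, xs, ys)) for h in hyps]
    score = sum(w * h(x) for w, h in zip(weights, hyps)) / sum(weights)
    if abs(score) <= delta:
        return 0
    return 1 if score > 0 else -1


# Toy noisy data (hypothetical): no single stump fits it perfectly.
xs = [0, 1, 2, 3]
ys = [-1, 1, -1, 1]
hyps = [make_stump(t) for t in (0.5, 1.5, 2.5)]

far_right = averaged_predict(hyps, xs, ys, 4)   # all stumps agree on +1
far_left = averaged_predict(hyps, xs, ys, -1)   # all stumps agree on -1
middle = averaged_predict(hyps, xs, ys, 2)      # good stumps disagree: abstain
```

Here the two best stumps have equal training error and disagree at `x = 2`, so the weighted vote is close to zero and the sketch abstains, while points where every hypothesis agrees receive a confident prediction.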